1 Basic Usage

Basic usage consists of the following steps:

  1. Create replay buffer (ReplayBuffer.__init__)
  2. Add transitions (ReplayBuffer.add)
    1. Reset at episode end (ReplayBuffer.on_episode_end)
  3. Sample transitions (ReplayBuffer.sample)

2 Example Code

Here is a simple example that stores standard environment transitions (i.e., obs, act, rew, next_obs, and done).

import numpy as np
from cpprb import ReplayBuffer

buffer_size = 256
obs_shape = 3
act_dim = 1
rb = ReplayBuffer(buffer_size,
                  env_dict={"obs": {"shape": obs_shape},
                            "act": {"shape": act_dim},
                            "rew": {},
                            "next_obs": {"shape": obs_shape},
                            "done": {}})

obs = np.ones(shape=(obs_shape))
act = np.ones(shape=(act_dim))
rew = 0
next_obs = np.ones(shape=(obs_shape))
done = 0

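for i in range(500):
    # Values must be passed to ReplayBuffer.add by keyword
    rb.add(obs=obs, act=act, rew=rew, next_obs=next_obs, done=done)

    if done:
        # Together with resetting environment, call ReplayBuffer.on_episode_end()
        rb.on_episode_end()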

batch_size = 32
sample = rb.sample(batch_size)
# sample is a dictionary whose keys are 'obs', 'act', 'rew', 'next_obs', and 'done'

3 Construction Parameters

(See also API reference)

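| Name           | Type                        | Optional                                      | Description                       |
|----------------|-----------------------------|-----------------------------------------------|-----------------------------------|
| size           | int                         | No                                            | Buffer size                       |
| env_dict       | dict                        | Yes (but the buffer is unusable without it)   | Environment definition (See here) |
| next_of        | str or array-like of str    | Yes                                           | Memory compression (See here)     |
| stack_compress | str or array-like of str    | Yes                                           | Memory compression (See here)     |
| default_dtype  | numpy.dtype                 | Yes                                           | Fallback data type                |
| Nstep          | dict                        | Yes                                           | Nstep configuration (See here)    |
| mmap_prefix    | str                         | Yes                                           | mmap file prefix (See here)       |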

4 Notes

The values stored in the buffer are defined flexibly by env_dict at buffer creation. The details are described in the documentation.

Because stored values have flexible names, you must pass them to ReplayBuffer.add as keyword arguments.